Identifying Cognate Sets Across Dictionaries of Related Languages

Authors

  • Adam St. Arnaud
  • David Beck
  • Grzegorz Kondrak
Abstract

We present a system for identifying cognate sets across dictionaries of related languages. The likelihood of a cognate relationship is calculated on the basis of a rich set of features that capture both phonetic and semantic similarity, as well as the presence of regular sound correspondences. The similarity scores are used to cluster words from different languages that may originate from a common protoword. When tested on the Algonquian language family, our system detects 63% of cognate sets while maintaining cluster purity of 70%.
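The pipeline outlined in the abstract (score word pairs, then cluster words from different languages) can be illustrated with a minimal sketch. This is not the authors' system. It assumes a simple normalized edit distance for form similarity and word overlap between glosses for semantic similarity, it omits sound correspondences entirely, and it uses plain single-linkage clustering. The function names, the weight `w_form`, the threshold, and the toy entries are all hypothetical; the word forms are invented and are not Algonquian data.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def form_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for maximally different ones."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def gloss_similarity(g1: str, g2: str) -> float:
    """Jaccard overlap between the word sets of two glosses."""
    s1, s2 = set(g1.lower().split()), set(g2.lower().split())
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def pair_score(e1: dict, e2: dict, w_form: float = 0.5) -> float:
    """Weighted mix of form and gloss similarity (weight is an assumption)."""
    return (w_form * form_similarity(e1["form"], e2["form"])
            + (1 - w_form) * gloss_similarity(e1["gloss"], e2["gloss"]))


def cluster(entries: list, threshold: float = 0.6) -> list:
    """Single-linkage clustering; only cross-language pairs may be linked."""
    parent = list(range(len(entries)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        if (entries[i]["lang"] != entries[j]["lang"]
                and pair_score(entries[i], entries[j]) >= threshold):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def purity(clusters: list, gold: list) -> float:
    """Share of items that carry the majority gold label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(gold[i] for i in c).values()) for c in clusters)
    return majority / total


# Invented toy dictionary entries for three hypothetical languages.
entries = [
    {"lang": "L1", "form": "tapun", "gloss": "stone"},
    {"lang": "L2", "form": "tabun", "gloss": "stone"},
    {"lang": "L3", "form": "tapon", "gloss": "small stone"},
    {"lang": "L1", "form": "miska", "gloss": "red"},
    {"lang": "L2", "form": "misko", "gloss": "red"},
    {"lang": "L3", "form": "wapun", "gloss": "dawn"},
]
gold = ["stone", "stone", "stone", "red", "red", "dawn"]

clusters = cluster(entries)
print(clusters, purity(clusters, gold))
```

Requiring both form and gloss evidence keeps look-alike words with unrelated meanings (here `wapun` versus `tapun`) out of the same cluster; a learned classifier over richer features, as the abstract describes, would replace the fixed weight and threshold used in this sketch.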


Similar resources

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...


Obtaining SMT dictionaries for related languages

This study explores methods for developing Machine Translation dictionaries on the basis of word frequency lists coming from comparable corpora. We investigate (1) various methods to measure the similarity of cognates between related languages, (2) detection and removal of noisy cognate translations using SVM ranking. We show preliminary results on several Romance and Slavonic languages.


Constraint-Based Bilingual Lexicon Induction for Closely Related Languages

The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot-language and cognate-recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on the assumption, we propose...


Effect of Cognate-Based Instruction Strategy on Vocabulary Learning Among Iranian EFL Learners

Cognates are words that share phonetic, orthographic, and semantic similarities across two or more languages. The aim of the present study was to investigate the effect of a cognate-based instruction strategy on vocabulary learning among Iranian EFL learners. To achieve the goal of the study, 80 EFL learners (15-27 years old) took part in the study; all of them were...




Publication date: 2017